DigitHist: a Histogram-Based Data Summary with Tight Error Bounds

نویسندگان

  • Michael Shekelyan
  • Anton Dignös
  • Johann Gamper
چکیده

We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an accurate and reliable histogram approach for multi-dimensional data. To achieve a compact summary, we use a sparse representation combined with a novel histogram compression technique that chooses a higher resolution in dense regions and a lower resolution elsewhere. For the construction of DigitHist, we propose a new error measure, termed u-error, which minimizes the width between the guaranteed upper and lower bounds of the selectivity estimate. The construction algorithm performs a single data scan and has linear time complexity. An in-depth experimental evaluation shows that DigitHist delivers superior precision and error bounds than state-of-the-art competitors at a comparable query time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust Identification of Smart Foam Using Set Mem-bership Estimation in A Model Error Modeling Frame-work

The aim of this paper is robust identification of smart foam, as an electroacoustic transducer, considering unmodeled dynamics due to nonlinearities in behaviour at low frequencies and measurement noise at high frequencies as existent uncertainties. Set membership estimation combined with model error modelling technique is used where the approach is based on worst case scenario with unknown but...

متن کامل

Steganalysis Method for LSB Replacement Based on Local Gradient of Image Histogram

In this paper we present a new accurate steganalysis method for the LSBreplacement steganography. The suggested method is based on the changes that occur in thehistogram of an image after the embedding of data. Every pair of neighboring bins of ahistogram are either inter-related or unrelated depending on whether embedding of a bit ofdata in the image could affect both bins or not. We show that...

متن کامل

GPS/INS Integration for Vehicle Navigation based on INS Error Analysis in Kalman Filtering

The Global Positioning System (GPS) and an Inertial Navigation System (INS) are two basic navigation systems. Due to their complementary characters in many aspects, a GPS/INS integrated navigation system has been a hot research topic in the recent decade. The Micro Electrical Mechanical Sensors (MEMS) successfully solved the problems of price, size and weight with the traditional INS. Therefore...

متن کامل

Generalization Error Bounds Using Unlabeled Data

We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabele...

متن کامل

Estimation of mutual information by the fuzzy histogram

Abstract Mutual Information (MI) is an important dependency measure between random variables, due to its tight connection with information theory. It has numerous applications, both in theory and practice. However, when employed in practice, it is often necessary to estimate the MI from available data. There are several methods to approximate the MI, but arguably one of the simplest and most wi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PVLDB

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2017